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1  Summary 

In  this  final  report,  we  summarize  the  algorithms  developed  and  some  results  for  hyperspectral  imagery  over 
the  period  of  the  contract.  It  includes  target  detection,  endmember  detection,  and  change  detection.  The 
change  detection  algorithm  has  been  tested  on  a  newer  data  set  provided  by  the  AFRL,  with  good  results.  A 
new  total  variation(TV)  based  method  for  the  unmixing  of  hyperspectral  data  through  compressed  sensing 
is  introduced.  It  allows  unmixing  directly  from  the  compressed  sensing  data  without  first  reconstructing 
the  entire  hyperspectral  cube.  A  nonnegative  matrix  factorization  and  completion  algorithm  is  presented 
which  allows  the  reconstruction  of  partially  observed  or  corrupted  hyperspectral  data.  As  a  spinoff  of  our 
hyperspectral  effort  we  propose  a  novel  video  rate  IR  multispectral  imaging  system  which  achieves  higher 
accuracy  at  lower  cost  for  offshore  oil  spill  sensing. 

2  Target  Detection 

We  summarize  our  target  detection  algorithm  as  follows.  Given  a  hyperspectral  image  (HSI)  with  NxN  pixels 
and  M  spectral  bands,  we  wish  to  locate  the  positions  of  pixels  that  correspond  to  a  given  spectral  signature 
/,  which  also  has  M  spectral  bands.  We  rearrange  A  as  an  M  x  N 2  matrix,  where  generally  M  <  N2.  The 
signals  cq  are  the  columns  of  this  spectral  matrix  A  and  correspond  to  each  pixel  in  the  image. 

Our  goal  is  to  find  u  G  RN  by  solving  the  constrained  minimization  problem 

u  =  argmin\u\1  s.t.  \\Au  —  f\\  <  S,  (  . 

u  >  0,  W 

where  S  is  a  measure  of  the  noise  in  the  system.  This  is  an  extremely  underdetermined  system,  but  we  seek 
solutions  u  whose  components  ui  ideally  vanish  when  the  ith  column  ai  is  not  a  match  for  /,  and  U{  >  0 
when  cti  is  a  match. 

To  solve  this,  we  apply  Bregman  iteration  [1,  2],  by  approximately  solving  a  sequence  of  unconstrained 
minimization  problems. 


Un+1  =  argmin  (g  1^  +  ^  \\Au  -  /n||2^  (2a) 

r  =  r~x + /  -  Aun~x  (2b) 

for  n  =  1,  2, ...,  with  ^°  =  0.  The  constant  A  is  usually  chosen  around  A  =  \\at°A\\  •  || Au  —  fn ||2  monotonically 
decreases  to  zero  and  un  converges  very  quickly  to  a  solution  of  (1)  with  S  =  0,  see  [2,  1].  The  sparsity  of 
the  solution,  which  is  equivalent  to  the  sensitivity  of  the  matching,  is  controlled  by  the  parameter  (i. 
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It  now  becomes  a  matter  of  solving  (2a)  and  (1)  efficiently.  We  propose  the  recently  developed  Split 
Bregman[3]  algorithm.  The  idea  behind  this  is  quite  simple.  There  are  two  simple  minimization  problems 
to  be  solved.  To  solve 

(3) 


argmvn  y/i  \u\1 

we  have  the  following  well  known  shrinkage  formula 

Ui  =  shrink(fi,  n)  = 
Moreover,  if  we  add  the  constraint  that  iq  >  0,  then 


>- 

/II2) 

fi-  n 

if  fi  >  H 

0 

if \fi\  <  M 

fi  +  M 

if  fi  <  — M 

fi~  M 

if  fi  >  H 

0 

if  fi  <  M 

(4) 

Ui  =  stirink  '  [fi,  ii)  =  <{  T  '  . r  V  W  (5) 

(6) 

(7) 

The  idea  behind  split  Bregman  is  as  follows:  We  replace  the  problem  (2a)  by  a  sequence  of  approximations 
generated  by  Bregman  iteration: 


To  solve 


for  a  fixed  vector  d,  we  have 


=  shrink+(fi,n)  =  (  ^ 


argmin  (  ^  \\Au  -  ff  +  ^\\ d-  uf 


=  ( XAtA  +  I)  1  ( XATf  +  d ) 


f  ( dk+1 ,  Uk+1 )  =  argmin/j  \d\x  +  f  \\Au  -  f\\2  +  \\\d-U-  bk\\2 
\  bk  =  bk_1  +Uk  -  dk~x 

The  steps  used  in  the  solution  for  (8a)  and  (8b)  involve  splitting 

Uk+1  =  ( XATA  +  I) _1  ( XATfn  -bk  +  dk ) 

gk+i  _  shrink  ([//c+1  4-  6/c+1,  /i) 

Uk  approaches  un+1  monotonically,  ^Uk  —  un\\  \  0,  and  of  course,  ||d  —  Uk\\  \  0.  Thus  we  use  an  inner 
iteration  to  obtain  the  sequence  Uk,dk ,  which  approximates  the  updated  u.  We  then  update  using  (2)  to  get 
/n+1  and  repeat  the  inner  iteration  to  get  un+2 .  This  procedure  is  very  efficient.  The  number  of  of  inner 
iterations  needed  is  problem  dependent,  but  usually  between  5  and  10. 


(8) 


(9) 


2.1  Incorporating  Spatial  Information  in  the  Target  Detection  Process 

Our  LI  target  detection  method  described  above,  as  well  as  most  other  methods,  does  not  directly  take  into 
account  the  spatial  relationship  between  pixels.  Once  the  hyper  spectral  data  cube  has  been  converted  to 
a  matrix,  the  relationship  between  the  spectral  pixels  of  the  HSI  is  ignored.  A  more  sophisticated  model 
would  take  this  spatial  information  into  account.  We  will  describe  a  simple,  yet  effective  method  of  using 
the  spatial  information  to  increase  the  accuracy  of  target  detection.  The  idea  is  to  apply  TV  denoising  [4]  to 
the  output  of  the  LI  target  detection  algorithm.  It  is  reasonable  to  assume  that  in  a  natural  scene,  a  target 
is  composed  of  several  neighboring  pixels.  Therefore,  it  seems  likely  that  a  single  pixel  identified  as  a  target, 
but  surrounded  by  non-target  pixels  is  a  false  alarm.  Conversely,  a  pixel  identified  as  a  non-target  (or  with  a 
small  return  value)  surrounded  by  target  pixels  is  likely  missed  target  pixel.  If  spatial  information  is  used, 
such  errors  can  be  reduced  or  eliminated.  This  is  especially  true  for  the  LI  target  detection  output.  Due  to 
the  sparsity  of  the  method,  most  non-targets  have  a  zero  value,  and  isolated  false  alarm  pixels  are  usually 
eliminated  by  the  TV  denoising  algorithm. 
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2.1.1  TV  Denoising 

Here  we  briefly  describe  the  total  variation  denoising  model [4]  we  use  in  the  above.  Given  an  image  I  G  R2, 
we  solve  the  following  LI  minimization  problem  to  denoise  the  image: 

minv\Vv\t+^\\v  -  I\\l , 

where  7  is  an  adjustable  fidelity  parameter.  The  Split  Bregman  method  as  described  in  [3]  is  used  to  solve 
this  minimization  problem  very  efficiently. 

2.2  Experimental  Results 

To  test  our  target  detection  algorithm,  we  used  data  that  was  provided  by  the  AFRL.  This  data  set  has 
300x600  spatial  pixels,  124  spectral  bands  and  consists  of  4  panels  of  different  material  or  paint,  with  grass 
in  the  foreground  and  trees  in  the  background.  In  order  to  compute  ROC  curves,  the  ground  truth  must 
be  known.  We  chose  the  panels  as  the  4  targets,  but  since  the  ground  truth  was  not  provided,  we  slightly 
modified  the  data  set.  The  boundaries  of  a  panel  were  replaced  with  random  background  pixels,  leaving  the 
center,  which  we  could  say  for  certain  was  the  target  and  the  ground  truth.  Figure  1  shows  modified  RGB 
of  the  data.  We  tested  the  LI,  LI  with  TV,  Matched  Filter,  and  ACE  on  the  modified  data  set  to  obtain 
ROC  curves  for  each  of  the  4  targets.  Figures  2-5  show  the  results.  All  of  the  methods  were  able  to  detect 
the  last  3  panels,  with  no  false  alarms(using  the  proper  thresholding),  however  on  the  first  panel,  LI  with 
TV  performed  the  best.  We  scaled  the  output  of  the  LI  target  detection  to  [0,255]  before  applying  the  TV 
denoising.  This  was  done  so  a  consistent  fidelity  parameter  could  be  used.  All  of  the  target  detection  results 
below  used  the  value  7  =  0.05. 

To  test  robustness  to  noise,  we  also  added  some  random  noise  to  the  data  and  performed  the  same  tests 
with  results  shown  in  Figures  6-9.  On  the  first  panel,  the  LI  and  LI  with  TV  methods  performed  much 
better  than  Matched  Filter  and  ACE.  However,  on  the  second  panel,  Matched  Filter  and  ACE  had  better 
performance  than  either  of  the  LI  methods.  The  LI  with  TV  edged  out  the  others  for  the  3rd  panel  in  terms 
of  detecting  all  of  the  target  pixels  with  the  fewest  false  alarms.  LI  with  TV  was  able  to  detect  all  of  the 
target  pixels  for  the  4th  panel  with  no  false  alarms. 


3  Endmember  Detection 


Our  endmember  detection  algorithm  is  based  on  the  work  done  in  [5],  and  we  summarize  it  as  follows.  Let 
/  be  an  HSI  with  NxN  pixels  and  M  spectral  bands.  We  set  A  as  an  M  x  N 2  matrix  whose  columns  are 
the  spectral  vectors  of  I.  We  want  to  find  U  G  RN  xN  ,  referred  to  as  the  abundance  matrix,  such  that 
AU  ~  A.  In  order  to  minimize  the  number  of  endmembers,  we  need  to  minimize  the  number  of  nonzero 
rows  of  U.  Additionally,  to  make  each  pixel  a  sparse  nonnegative  linear  combination  of  the  endmembers,  we 
need  to  minimize  the  number  of  nonzero  entries  in  each  row,  i.e.  make  the  matrix  sparse.  In  practical  terms, 
the  matrix  U  is  extremely  large  and  solving  for  it  would  nearly  impossible.  In  order  to  make  the  problem 
manageable,  we  need  to  reduce  the  number  of  candidate  endmembers  by  choosing  a  subset  of  the  columns 
of  A  to  make  a  much  smaller  matrix  As  of  size  M  x  P,  where  P  «  N2.  We  then  try  find  Us  G  RPxP 
such  that  ASUS  «  As.  One  method  of  obtaining  As  is  to  cluster  the  pixels  of  A  and  choose  a  representative 
subset  from  each  cluster.  The  problem  is  then  stated  as 


min 

us>  0 


C^ma x(Us  itj)  +  (<rcw, 

i 


Us)  +  §  \\{AsUa 


Us)CwfF. 


(10) 


The  data  fidelity  term  is  |  \\(ASUS  —  US)CW ||^,  ||-||F  is  the  Frobenius  norm,  and  Cw  is  a  diagonal  matrix 
used  to  weight  the  columns  of  ( ASUS  —  Us)-  Since  As  is  composed  of  representative  members  of  clusters,  we 
want  give  those  from  larger  clusters  more  weight.  The  term  (  JV  ma Xj(Us  ij )  promotes  the  rows  of  Us  to  be 
zero,  and  (crcw,  Us)  encourages  the  sparsity  ofUs-  cr  is  a  P  x  P  matrix  of  weights  and  the  weights  are  chosen 

/  -[1-(ATXs)^?]2  \ 

as  aij  =  v  I  1  —  e  2^2  ) .  This  choice  of  weights  promotes  the  sparsity  of  Us.  We  solve  (10)  by  using 

split  Bregman  algorithm  which  is  related  to  the  alternating  direction  method  of  multipiers (ADMM  [6]). 
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Panel  1  Data 


Panel  2  Data 


Panel  3  Data 


Panel  4  Data 


Figure  1:  RGB  of  the  modified  scenes  for  the  4  panels 

3.1  Experimental  Results 

The  test  HSI  is  the  URBAN  data,  which  has  307x307  spatial  pixels  and  163  spectral  bands.  Figure  10  shows 
a  color  composite  of  the  scene.  Six  endmembers  were  computed,  using  parameter  values  (  =  2  and  [3  =  250. 
Once  the  endmembers  have  been  computed,  we  unmix  the  HSI  to  get  abundance  values  for  material.  We 
can  “cluster”  the  data  by  setting  the  material  with  the  largest  abundance  value  as  the  material  for  the  pixel. 
Figure  11  shows  a  composite  of  the  6  clusters  and  Figure  12  shows  the  clusters  plotted  individually.  By 
comparing  to  the  color  image  of  the  scene,  the  detected  clusters  appear  to  represent  asphalt,  trees,  grass, 
roofing  material,  dirt  and  some  other  type  of  vegetation. 
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Ground  Truth  for  panel  1 


Matched  Filter 


(a)  The  ground  truth  and  outputs  of  Matched  Filter,  LI,  and  LI  after  TV. 


LI  (jll=  1)  vs  Matched  Filter  vs  ACE 


CD 

~o 


O 


_Q 

cd 

-Q 

O 


log(False  alarm  rate) 

(b)  ROC  curves 


Figure  2:  Detecting  the  1st  panel. 
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Ground  Truth  for  Panel  2 


Matched  Filter 


(a)  The  ground  truth  and  outputs  of  Matched  Filter,  LI,  and  LI  after  TV. 


LI  (jll=  1)  vs  Matched  Filter  vs  ACE 


Figure  3:  Detecting  the  2nd  panel. 
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Ground  Truth  for  Panel  3  Matched  Filter 


(a)  The  ground  truth  and  outputs  of  Matched  Filter,  LI,  and  LI  after  TV. 


LI  (jll=  5)  vs  Matched  Filter  vs  ACE 


Figure  4:  Detecting  the  3rd. 
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Ground  Truth  for  Panel  4 


LI  after  TV  regularization 


(a)  The  ground  truth  and  outputs  of  Matched  Filter,  LI,  and  LI  after  TV. 


LI  (M-=  1 ) 


Matched  Filter 
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Figure  5:  Detecting  the  4th  panel. 
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Ground  Truth  for  Panel  1 


Matched  Filter 


L1(|i=  0.01)  LI  after  TV  regularization 


(a)  The  output  of  the  4  methods 


LI  (n=  0.01 )  vs  Matched  Filter  vs  ACE 


log(False  alarm  rate) 


(b)  ROC  curves 


Figure  6:  Detecting  the  1st  panel  on  data  with  noise. 


9 


Ground  Truth  for  Panel  2 


Matched  Filter 


L1(n=  0.0075) 


LI  after  TV  regularization 


(a)  The  output  of  the  4  methods 


LI  (jn=  0.0075)  vs  Matched  Filter  vs  ACE 


Figure  7:  Detecting  the  2nd  panel  on  data  with  noise. 
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Ground  Truth  for  Panel  3  Matched  Filter 


(a)  The  output  of  the  4  methods 


LI  {\x=  0.3)  vs  Matched  Filter  vs  ACE 


10-4  10-2  10° 
log(False  alarm  rate) 


(b)  ROC  curves 


Figure  8:  Detecting  the  3rd  panel  on  data  with  noise. 
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Ground  Truth  for  Panel  4 


Matched  Filter 


L1(n=  0.0005) 


LI  after  TV  regularization 


(a)  The  output  of  the  4  methods 


LI  (jll=  0.0005)  vs  Matched  Filter  vs  ACE 


Figure  9:  Detecting  the  4th  panel  on  data  with  noise. 
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Figure  10:  Color  composite  of  the  scene  for  the  URBAN  data. 


Figure  11:  Clustering  of  the  6  materials. 
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Figure  12:  The  6  clusters  that  appear  to  be(starting  from  top,  left  to  right)  asphalt,  trees,  grass,  roofing 
material,  dirt  and  some  other  type  of  vegetation. 
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4  Change  Detection 

In  this  report  we  apply  our  change  detection  algorithm  to  a  new  data  set.  First,  we  summarize  the  problem 
and  steps  of  the  method.  Given  two  hyperspectral  images  I\  and  I2  of  the  same  scene  taken  at  different 
times,  we  want  to  determine  any  changes  or  differences  between  them.  The  first  step  is  to  obtain  a  set  of 
endmembers  for  both  HSIs  simultaneously.  This  is  done  using  our  endmember  detection  algorithm  presented 
in  previous  reports,  and  it  chooses  pixels  of  the  HSIs  as  endmembers.  By  using  both  HSIs  in  the  endmember 
detection,  we  will  get  some  endmembers  from  the  I\  and  the  rest  from  I2 .  Let  {e{,  e\,  •  •  •  eN±}  be  the  Ni 
endmembers  chosen  from  I\  and  \e\ ,  e|, . . .  }  be  the  N2  endmembers  chosen  from  I2.  Using  the  combined 

set  of  endmembers  and  the  unmixing  algorithm  we  developed  in  previous  reports,  we  unmix  each  pixel  of  each 
HSI  obtaining  an  abundance  vector  for  each  pixel.  The  components  of  the  abundance  vectors  correspond 
to  the  endmembers.  One  measure  of  change  is  to  compute  the  angle  between  the  abundance  vectors  of  I\ 
and  I2  •  If  there  was  a  change  in  the  materials  of  a  given  pixel,  it  is  expected  that  the  difference  between 
abundance  vectors  would  indicate  that  change. 

Although  this  method  can  work  well  for  certain  data  sets,  we  have  found  that  shadows  present  in  one 
image  but  not  present  in  the  other,  are  often  detected  as  a  change.  We  observed  that  the  shadowed  regions  are 
usually  detected  as  an  endmember  by  the  endmember  detection  algorithm.  Additionally,  a  similar  material 
that  is  not  under  a  shadow  is  often  chosen  as  an  endmember  as  well.  We  observed  that  if  the  components  of 
the  abundance  vectors  corresponding  to  these  two  endmembers  representing  similar  materials  under  different 
lighting  conditions  are  combined,  we  can  drastically  reduce  the  detection  of  shadows  as  change.  Through 
numerical  experimentation,  we  have  developed  a  method  that  appears  to  help  to  determine  which  components 
of  the  abundance  vectors  to  combine.  Essentially,  we  look  at  the  angle  between  pairs  of  endmembers,  one 
from  each  HSI.  The  smaller  the  angle,  the  more  similar  we  consider  the  materials  to  be.  We  have  found 
that  taking  an  average  of  the  pixels  around  the  chosen  endmembers  produced  better  results.  This  may  be 
because  the  endmember  detection  algorithm  tries  to  pick  pixels  that  are  different  from  each  other.  Therefore 
if  a  pixel  of  a  given  material  is  chosen,  it  is  likely  to  be  fairly  different  from  the  average  signature  of  that 
material.  The  averaging  process  makes  it  easier  to  determine  the  similarity  of  materials,  and  thus  is  more 
useful  in  determining  which  components  of  the  abundance  vector  to  combine.  In  previous  reports,  we  did 
experiments  to  determine  the  effect  of  the  patch  size  that  is  used  to  get  the  average  endmember  signature, 
and  we  found  that  the  size  is  not  that  critical.  Of  course,  this  patch  size  may  depend  on  the  spatial  resolution 
of  the  image,  but  a  patch  size  of  5x5  has  worked  well  for  all  of  the  data  sets  we  have  tried.  This  is  a  simple 
approach,  and  has  produced  good  results,  but  other  methods  of  determining  which  endmembers  are  similar 
may  work  better.  One  example  would  be  to  use  a  library  of  spectral  signatures  of  materials  under  different 
lighting  and  atmospheric  conditions. 

Once  we  have  a  list  of  the  “closest”  endmembers,  we  need  to  determine  whether  or  not  to  combine  the 
corresponding  components  of  the  abundance  vectors.  Again  through  numerical  experiments  of  the  data  we 
have,  we  found  that  combining  averaged  endmembers  that  have  an  angle  less  than  0.1-0.15  gave  us  good 
results.  Further  experimentation  on  several  data  sets  should  give  a  better  idea  of  what  a  good  threshold 
value  is,  and  if  that  value  is  dependent  on  the  data  itself. 

4.1  Results  on  Airport  Data 

The  data  set  consists  of  two  hyperspectral  images  of  the  same  scene  taken  at  different  times.  They  have 
961x521  spatial  pixels  and  40  spectral  bands  each.  Figure  13  shows  the  first  band  of  each  HSI.  We  see  that 
almost  half  of  the  scene  is  runways  and  grass  where  there  does  not  appear  to  be  any  change  in  the  object 
present,  so  we  will  work  on  two  subsets  of  the  scene  where  more  change  has  taken  place.  Figure  14  is  a 
350x480  pixel  subset  of  the  image  that  includes  planes,  hangers,  and  other  small  objects.  In  these  figures, 
we  can  see  the  different  shadows  produced  by  taking  the  images  at  different  times  of  day.  Such  dark  and 
pronounced  shadows  make  change  detection  difficult,  as  they  are  often  detected  as  change  when  it  is  simply 
a  change  of  lighting  instead  of  an  actual  change  in  material. 

The  first  step  in  our  change  detection  algorithm  is  to  compute  endmembers,  and  Figure  15  shows  the 
locations  of  the  pixels  computed  as  endmembers.  Figure  16  shows  the  spectral  signatures  of  the  endmembers. 
Once  the  endmembers  are  obtained,  each  HSI  is  unmixed  and  an  abundance  vector  is  computed  for  each 
pixel.  The  angle  between  corresponding  abundance  vectors  are  then  computed.  Figure  17  shows  the  angle 
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between  the  abundance  vectors  from  each  HSI.  We  see  that  the  areas  of  dark  shadows  are  indicated  as 
areas  of  change.  Following  the  next  step  of  our  method,  we  compute  an  average  spectral  signature  for 
each  endmember  by  taking  an  average  of  a  5x5  patch  centered  on  the  endmember  pixel.  Figure  18  shows 
the  spectral  signatures  of  the  averaged  endmembers.  Then  the  angle  between  these  averaged  endmember 
signatures  was  computed,  and  the  components  of  the  abundance  vectors  corresponding  to  the  closest  averaged 
endmembers  are  combined.  Table  1  lists  the  3  closest  endmember  pairs  and  the  angles  between  them.  We 
see  that  the  5th  and  6th  are  the  closest  endmembers,  and  they  appear  to  be  shady  concrete  and  sunny 
concrete,  respectively.  Combining  the  5th  and  6th  components  of  the  abundance  vectors  and  computing  the 
angle  between  the  reduced  abundance  vectors  gives  us  the  result  in  Figure  19.  A  significant  amount  of  the 
shadows  produced  by  the  buildings  and  some  of  the  planes  has  been  reduced.  The  final  step  we  used  was  to 
apply  total  variation  denoising  to  incorporate  spatial  information,  and  that  result  is  shown  in  Figure  20. 


(a)  First  image 


(b)  Second  image 


Figure  13:  Plots  of  the  first  band  for  both  images 
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Endmember  from  first  HSI 

Endmember  from  second  HSI 

angle 

1 

Endmember  5 

Endmember  6 

0.0983 

2 

Endmember  3 

Endmember  7 

0.2386 

3 

Endmember  1 

Endmember  4 

0.3060 

Table  1:  Sorted  table  of  closest  averaged  7  endmembers. 


5  TV-based  Hyperspectral  Compressed  Sensing  and  Unmixing 

Hyperspectral  unmixing  is  to  decompose  each  pixel  spectrum  to  identify  and  quantify  the  relative  abundance 
of  each  endmember  (material),  which  typically  demands  enormous  computational  resources  in  terms  of 
storage,  computation  and  I/O  throughput,  especially  when  real-time  processing  is  desired.  Therefore,  it 
is  particularly  difficult  to  directly  process  and  analyze  hyperspectral  data  cubes  in  real  time  or  near  real 
time.  On  the  other  hand,  hyperspectral  data  are  highly  compressible  with  two- fold  compressibility:  1)  each 
spatial  image  is  compressible,  and  2)  the  entire  cube,  when  treated  as  a  matrix,  is  of  low  rank.  To  fully 
exploit  such  rich  compressibility,  we  propose  a  scheme  that  never  requires  us  to  explicitly  store  or  process 
any  hyperspectral  cube  itself.  In  this  CSU  (compressive  sensing  and  unmixing)  scheme,  data  are  acquired 
by  means  of  CS  and  directly  unmixed  without  reconstructing  the  underlying  data  cube. 

Mathematically  and  ideally,  the  hyperspectral  data  model  has  the  form 

X  =  HW ,  HlUe  =  lnp,  and  H  >  0,  (11) 

where  X,  H  and  W  are  the  matrix  representations  of  hyperspectral  cube,  abundance  fraction  and  signatures 
of  endmembers.  1  represents  a  vector  of  all  ones,  and  ne,  rib  and  nv  denote  the  number  of  endmembers,  bands 
and  pixels,  respectively.  Similar  to  other  CS  models,  we  consider  the  data  acquisition  model  AX  =  F  where 
A  G  Mmxnp  is  a  random-like  sensing  matrix  with  m  <  np.  To  directly  unmix  the  compressed  observation  F, 
we  proposed  to  solve  a  compressed  unmixing  model  (or  its  variants) 

ne 

minVT V(Hej)  s.t.  AHW  =  F,  Hln&  =  lnp,  (12) 

H  '  ^ 

3  =  1 

by  TVAL3’s  variation  specially  tailored  to  this  model.  When  the  endmember  spectral  signatures  W  is  given, 
we  also  propose  a  preprocessing  procedure  based  on  the  singular  value  decomposition  of  F  to  substantially 
reduce  the  problem  sizes,  and  enable  near-real-time  processing  speeds.  The  feasibility  of  the  proposed 
approach  has  been  demonstrated  by  a  test  using  compressed  hyperspectral  data  collected  by  hardware 
similar  to  the  single-pixel  camera  [7],  as  shown  in  Figure  21.  Our  target  image  is  the  color  wheel  on  the  left, 
composed  of  various  intensity  levels  of  three  colors:  yellow,  cyan  and  magenta.  The  abundance  fractions 
corresponding  to  the  three  endmembers  were  computed  from  10%  measured  data,  and  are  shown  on  the 
right  side  of  Figure  21.  The  computational  time  to  process  the  compressed  unmixing  was  about  26  seconds. 
As  we  can  see,  our  model  and  algorithm  have  accurately  detected  the  areas  corresponding  to  each  color  at 
various  levels  of  brightness. 

Four  slices  of  the  computed  hyperspectral  cube  (corresponding  to  4  different  spectral  bands),  obtained 
by  multiplying  the  estimated  abundance  matrix  H  with  W,  are  given  on  the  left  side  of  Figure  22.  For  a 
comparison,  we  applied  the  2D  TV  solver  TwIST  [8]  to  the  same  10%  measured  data  set  and  the  results  are 
given  on  the  right  side  of  Figure  22.  Clearly  the  results  of  CSU  are  much  better  than  the  TwIST  algorithm. 
Additionally,  the  computational  time  required  for  CSU  is  at  least  an  order  of  magnitude  less  than  that 
required  by  state-of-the-art  2D  TV  solvers. 

This  work  for  the  first  time  has  proved  the  concept  of  unmixing  a  hyperspectral  data  cube  without  the 
cube  itself.  We  have  also  obtained  results  on  “blind”  unmixing  where  information  of  endmember  signatures 
is  either  only  partially  known  or  severely  corrupted.  These  results  may  have  potential  impact  in  the  field  of 
hyperspectral  data  processing. 
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6  Nonnegative  matrix  factorization  and  completion 

6.1  The  model  and  an  ADM-based  algorithm 

Our  paper  [9]  focuses  on  the  optimization  problem 

min  \\Vn(XY  -  M)\\2f 

A ,  Y 

s.t.  X  G  Mmx<?,F  G  M<?xn,  (13) 

Xij  >  0 ,Yi:j  >  0,V  i,  j, 

where  Q  C  {1,  •  •  •  ,  ra}  x  {1,  •  •  •  ,  n}  indexes  the  known  entries  of  M  and  Vq(A)  returns  a  copy  of  A  that 
zeros  out  entries  not  in  fi.  We  call  this  problem  nonnegative  matrix  factorization/ completion  (NMFC) 
since  it  is  a  combination  of  nonnegative  matrix  factorization  (NMF)— which  finds  nonnegative  factors  of  a 
nonnegative  matrix  given  all  of  its  entries— and  low-rank  matrix  completion  (LRMC)— which  recovers  M  from 
an  incomplete  set  of  its  entries  without  assuming  nonnegativity. 

We  are  interested  in  NMFC  since  it  complements  NMF  and  LRMC.  NMF  has  been  widely  used  in  data 
mining  such  as  text  mining,  dimension  reduction  and  clustering,  as  well  as  spectral  data  analysis.  Unlike 
NMF,  NMFC  assumes  that  the  underlying  matrix  is  incompletely  sampled;  hence,  it  leads  to  saving  of 
sampling  time  and  storage  (for  data  such  as  images)  and  has  broader  applicability.  On  the  other  hand, 
LRMC  has  recently  found  a  large  number  of  applications  including  collaborative  filtering,  global  positioning, 
system  identification  and  order  reduction,  as  well  as  the  background  subtraction  and  structure-from-motion 
problems  in  computer  vision. 

A  rank-g  matrix  M  can  be  written  as  M  =  XY  for  matrices  X  with  q  columns  and  Y  with  q  rows.  When 
X  and  Y  are  known  to  be  nonnegative  a  priori ,  empirical  evidence  shows  that  imposing  nonnegativity  on  the 
factors  improves  the  recovery  quality.  In  particular,  in  certain  applications  such  as  hyper  spectral  unmixing, 
the  factors  are  nonnegative  due  to  their  physical  nature,  so  these  applications  will  benefit  from  NMFC.  To 
summarize,  NMFC  combines  NMF  and  LRMC,  and  NMFC  is  useful  when  the  underlying  matrix  has  both 
low  rank  and  nonnegative  factors. 

Since  problem  (13)  is  nonconvex,  finding  its  solution  becomes  quite  difficult.  Although  there  is  no 
theoretical  guarantee  of  a  solution,  we  observe  very  good  numerical  results  by  applying  the  alternating 
direction  method  (ADM)  to  NMF,  which  is  also  nonconvex. 

6.2  Numerical  results 

We  compared  our  algorithm  in  [9]  to  the  algorithm  proposed  in  [10],  which  takes  complete  samples  of  M  and 
performs  similar  ADM-based  iterations  on  random  matrices  with  varying  number  of  sampled  entries.  The 
recovery  qualities  and  speeds  are  illustrated  in  Figure  23. 

We  compared  our  algorithm  in  [9]  with  LMaFit  [11]  and  FPCA  [12]  on  recovering  three-dimensional 
hyperspectral  images  from  their  incomplete  observations.  Our  test  hyperspectral  datacube  has  163  slices, 
and  the  size  of  each  slice  is  80  x  80.  Three  selected  slices  are  shown  in  Figure  24.  The  three  algorithms 
were  compared  on  recovering  M  from  incomplete  observations  of  SR  =  30%,  40%,  50%,  and  their  results 
were  compared  in  terms  of  peak  signal-to-noise  ratio  (PSNR),  mean  squared  error  (MSE),  as  well  as  relative 
nonnegativity  feasibility  (FA).  The  results  are  given  in  table  2,  and  the  three  slices  of  the  recovered  data 
cube  that  correspond  to  those  in  Figure  24  are  given  in  Figure  25. 

6.3  A  Fast  Algorithm  with  Convergence  Guarantee 

The  method  proposed  in  our  recent  paper  [13]  closes  the  gap  between  the  above  ADM-based  algorithm, 
which  is  very  fast  but  lacks  convergence  guarantees,  and  ANLS-type  (alternating  nonnegative  least-squares) 
algorithms  [14,  15,  16],  which  are  widely  used  and  have  convergence  guarantees  but  are  comparatively  slow. 
The  new  method  is  based  on  applying  the  alternating  proximal  gradient  method  to 

min F(X,  Y)  =  \\\XY  -  M\\2F,  s.t.  X  e  K™*9,  Y  G  R+Xn,  (14) 
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Table  2:  Recovered  slices  by  Algorithm  [9],  LMaFit,  and  FPCA.  The  rank  estimate  for  Algorithm  [9]  and 
LMaFit  is  30. 


problem 

Alg  [9] 

LMaFit 

FPCA 

seed 

CPU 

PSNR 

MSE 

FA 

CPU 

PSNR 

MSE 

FA 

CPU 

PSNR 

MSE 

FA 

SR:  30% 

3445 

35.04 

47.52 

1.85e+l 

0 

40.39 

42.10 

6.45e+l 

8.57e-3 

38.49 

44.53 

3.69e+l 

1.12e-3 

31710 

34.77 

47.32 

1.94e+l 

0 

23.43 

43.46 

4.71e+l 

5.08e-3 

38.40 

44.42 

3.79e+l 

1.24e-3 

43875 

34.31 

47.42 

1.89e+l 

0 

38.27 

42.54 

5.83e+l 

7.70e-3 

38.64 

44.71 

3.54e+l 

1.21e-3 

69483 

34.19 

47.36 

1.92e+l 

0 

34.53 

42.74 

5.57e+l 

6.83e-3 

38.75 

44.62 

3.61e+l 

1.04e-3 

95023 

33.69 

47.42 

1.90e+l 

0 

27.88 

42.98 

5.27e+l 

6.19e-3 

38.78 

44.51 

3.70e+l 

1.19e-3 

SR:  40% 

3445 

36.84 

48.83 

1.37e+l 

0 

35.05 

43.90 

4.26e+l 

7.55e-3 

42.76 

44.72 

3.53e+l 

1.09e-3 

31710 

36.45 

48.66 

1.42e+l 

0 

24.45 

44.71 

3.54e+l 

5.48e-3 

42.57 

44.53 

3.69e+l 

1.06e-3 

43875 

36.61 

48.92 

1.34e+l 

0 

17.67 

46.07 

2.59e+l 

4.65e-3 

42.85 

44.66 

3.58e+l 

1.31e-3 

69483 

38.37 

48.68 

1.42e+l 

0 

19.13 

45.59 

2.89e+l 

5.52e-3 

42.88 

44.52 

3.70e+l 

1.06e-3 

95023 

38.65 

48.50 

1.48e+l 

0 

22.16 

45.30 

3.09e+l 

5.02e-3 

43.22 

44.56 

3.66e+l 

1.09e-3 

SR:  50% 

3445 

39.87 

49.74 

l.lle+1 

0 

34.31 

44.72 

3.53e+l 

9.62e-3 

47.12 

44.24 

3.94e+l 

8.55e-4 

31710 

39.77 

49.75 

l.lle+1 

0 

29.90 

45.23 

3.14e+l 

4.78e-3 

46.64 

45.25 

3.12e+l 

1.12e-3 

43875 

37.78 

49.78 

1.10e+l 

0 

25.53 

45.92 

2.68e+l 

6.12e-3 

46.63 

44.60 

3.63e+l 

1.44e-3 

69483 

38.24 

49.65 

1.14e+l 

0 

31.14 

44.92 

3.37e+l 

7.77e-3 

47.07 

44.39 

3.81e+l 

1.08e-3 

95023 

40.09 

49.64 

1.14e+l 

0 

29.92 

45.42 

3.00e+l 

5.64e-3 

47.90 

43.93 

4.24e+l 

1.16e-3 

where  for  presentation  brevity  we  have  let  Pq  =  I  in  (13)  and  M™xn  =  {X  E  Mmxn  :  Xij  >  0, 1  <  i  < 
m,  1  <  j  <  n}.  Specifically,  proximal  gradient  steps  are  applied  alternatively  to  the  X-subproblem  (15a) 
and  F-subproblem  (15b): 

=  arg  min  hxY^1  -  M\\2F,  (15a) 

y(.  ^0  A 

Yk  =  argmm  i||XfeF  —  M\\%,  (15b) 

respectively.  Our  method  uses  simple  updates  at  each  iteration  and  has  fast  convergence  like  the  ADM- 
based  algorithm.  In  addition,  it  decreases  the  objective  monotonically  and  has  provable  convergence  like  the 
ANLS-type  algorithms.  Under  some  mild  assumptions,  we  prove  convergence  and  estimate  the  convergence 
rate  by  applying  the  Kurdyka-Lojasiewicz  inequality  [17,  18]. 

6.3.1  Proximal  gradient  method 

Consider  the  convex  optimization  problem 

min /(#),  (16) 

x£X 

where  A  is  a  convex  set  and  /  is  a  differentiable  convex  function.  Assume  /  has  Lipschitz  continuous 
gradient,  i.e.,  there  is  a  constant  Lf  >  0  such  that 

||V/(x)  -  V/(y)||2  <  Lf\\x  -  y\\2,  for  all  x,y€X. 

The  proximal  gradient  method  for  solving  (16)  is 

xk+ 1  =  Vx(xk  -  Vf(xk)/Lf),  for  k  =  0, 1,  •  •  •  (17) 

where  Vx  denotes  the  projection  to  X.  In  (17),  xk+1  is  the  solution  of 

min  f(xk)  +  (V  f(xk),x  —  xk)  +  ^-\\x-  xk\\l  (18) 

xEX  Z 

With  an  appropriate  choice  of  xk,  this  method  is  guaranteed  to  converge  to  a  solution  of  (16). 

Applying  this  method  to  our  subproblem 

mmh{W)rnl-\\AW-B\\2F ,  (19) 
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Algorithm  1  APG:  Alternating  proximal  gradient  method  for  NMF 
Input:  m  x  n  nonnegative  matrix  M  and  pre-determined  dimension  q 

Initialization:  choose  a  positive  number  <L,  <  1  and  randomize  matrices  X~x  =  X°  >  0  and  Y~l  = 
F°  >  0 

for  k  =  0, 1,  2, ...  do 

Set  w*  ,  w*  according  to  (22)  and  let  Xk  =  Xk  +  ukx(Xk  -  Xk~x),  Yk  =  Yk  +  ujy(Yk  -  Yk~x). 

Update  (X/c+1,  F/c+1)  according  to  (21). 
if  F(Xk+1,Yk+1)  >  F(Xk,Yk)  then 
Update  (Xk+1,Yk+1)  according  to  (21)  with  Xk  =  Xk,Yk  =  Yk 

end  if 

if  some  stopping  criterion  is  met  then 
Stop  and  output  (Xfe+1,  Yk+1) 

end  if 
end  for 


we  obtain 

Wk+ 1  =  V+  (wk  -  \7h(Wk)/Lh )  ,  (20) 

where  L h  is  a  Lipschitz  constant  of  Vh(W).  Although  we  can  iterate  (20)  many  times  to  obtain  (15a)  and 
do  the  same  to  obtain  (15b),  we  propose  to  run  only  one  iteration  of  (20)  toward  (15a)  and  do  the  same 
toward  (15b).  This  way,  we  alternatively  update  X,Y  by 


xk+1  =p+  (xk  -  VxF(Xk,Yk)/LkxSj  ,  (21a) 

Yk+ 1  =  V+  (Yk  -  VYF(Xk+1,Yk)/L^  ,  (21b) 


where  Lkx  and  Ly  are  Lipschitz  constants  of  V xF{X,  Yk)  and  VyF(A/c+1,F)  with  respect  to  X  and  Y, 
respectively. 

In  our  algorithm,  we  take  Lkx  and  Ly  as  the  spectral  norm  of  Yk  and  Xfc+1,  respectively.  In  addition, 
we  set  Xk  =  Xk  +  cjx(Xk  —  Xk~x)  and  Yk  =  Yk  +  Uy(Yk  —  U^-1),  where  ujx^ujy  are  chosen  based  on 


the  Nesterov- type  extrapolation  weight.1 


For  ease  of  convergence  analysis,  we  impose  ujx 


and 


Jy  <  Su 


with  some  nonnegative  constant  5^  <  1.  Specifically,  we  set 


ujx  =  min 


ujy  =  min 


(22) 


where  &k  =  (tk-i  ~  l)/tfe  and  tk  =  \  ^1  +  y  1  +  with  initial  value  t- 1  =  1.  To  ensure  monotonic 

objective  decrease,  whenever  F(Xk+1,  Yk+1)  >  F(Xk ,  Yk),  we  redo  the  fcth  iteration  to  update  (Xk+1 ,  Yk+1) 
with  Xk  =  Xk ,  Yk  =  Yk.  The  method  is  summarized  in  Algorithm  1,  and  we  call  it  APG. 


6.3.2  Convergence  Guarantee 

Below  we  present  the  theorems  that  summarize  the  convergence  theory  given  in  [13]. 

Theorem  6.1  (Global  convergence).  Assume  that  the  sequence  {Zk}  generated  by  the  algorithm  APG  is 
uniformly  away  from  zero  and  bounded,  i.e.,  there  exist  positive  constants  £,  L  such  that  i  <  Lx,  Ly  <  L  for 

any  k  >  0.  In  addition,  assume  5^  <  for  some  nonnegative  S  <  1.  Then,  {Zk}  converges  to  a  critical 
point  of  (14)  from  any  starting  point  Z°. 

lrThe  dynamically  updated  extrapolation  weight  used  in  [19,  20]  also  works  well  for  our  algorithm. 
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APG 

ADM 

blockpivot 

Q 

relerr 

time 

relerr 

time 

relerr 

time 

20 

1.18e-2 

34.2 

2.34e-2 

87.5 

1.38e-2 

62.5 

30 

9.07e-3 

63.2 

2.02e-2 

116 

1.10e-2 

143 

40 

7.56e-3 

86.2 

L78e-2 

140 

9.59e-3 

194 

50 

6.45e-3 

120 

1.58e-2 

182 

8.e-3 

277 

Table  3:  Comparison  results  of  the  solutions  obtained  by  APG,  ADM  and  blockpivot  on  hyperspectral 
cube  of  size  150  x  150  x  163  with  dimension  q  varying  among  {20,  30, 40,  50} 


o 

iO 

II 

Si 

APG-MC 

ADM-MC 

o 

o 

T - 1 

II 

Si 

APG-MC 

ADM-MC 

SR 

PSNR 

MSE 

PSNR 

MSE 

SR 

PSNR 

MSE 

PSNR 

MSE 

0.20 

32.30 

5.89e-4 

28.72 

1.35e-3 

0.20 

32.57 

5.54e-4 

28.80 

1.33e-3 

0.30 

40.65 

8.62e-5 

33.58 

4.64e-4 

0.30 

41.19 

7.61e-5 

33.69 

4.52e-4 

0.40 

45.77 

2.66e-5 

38.52 

1.46e-4 

0.40 

46.03 

2.50e-5 

38.69 

1.41e-4 

Table  4:  Comparison  results  of  the  solutions  obtained  by  APG-MC  and  ADM-MC  on  a  hyperspectral  cube 
of  size  150  x  150  x  163  for  sample  ratio  SR  =  {0.20,  0.30,  0.40}  and  maximum  running  time  (sec)  T  =  50, 100. 


Theorem  6.2  (Convergence  rate).  Under  the  assumptions  of  Theorem  6.1,  suppose  that  Zk  converges  to 
some  critical  point  Z .  Then  there  exists  some  6  G  (0, 1)  such  that  the  following  estimations  hold: 

1.  If  6  G  (0,  \],  then  \\Zk  —  Z\\p  <  Crk  for  some  constants  C  >  0  and  r  G  [0, 1); 

2.  If  6  G  (|,  1),  then  \\Zk  —  Z\\p  <  Ck -C-G/C20-1)  for  some  constant  C  >  0. 

6.4  Hyperspectral  Simulation 

It  has  been  shown  in  [21]  that  NMF  can  be  applied  to  spectral  data  analysis.  In  [21],  a  regularized  NMF  model 
is  also  considered  with  penalty  terms  o||X|||,  and  /?||F|||.  added  in  the  objective  of  (14).  The  parameters 
a  and  f3  can  be  tuned  for  a  specific  purpose  in  practice.  Here,  we  focus  on  the  original  NMF  model  (14)  to 
show  the  effectiveness  of  our  algorithm.  However,  our  method  can  be  easily  extended  to  solve  the  regularized 
NMF  model.  In  this  test,  we  use  a  150  x  150  x  163  hyperspectral  cube  to  test  the  compared  algorithms. 
Each  slice  of  the  cube  is  reshaped  as  a  column  vector,  and  a  22500  x  163  matrix  M  is  obtained.  In  addition, 
the  cube  is  scaled  to  have  the  unit  maximum  element.  Four  selected  slices  are  shown  in  the  first  row  of 
Figure  26  corresponding  to  the  1st,  50th,  100th  and  150th  columns  of  M.  The  dimension  q  is  chosen  from 
{20,  30, 40,  50},  and  Table  3  lists  the  average  results  of  10  independent  trials.  We  can  see  from  the  table  that 
APG  is  superior  to  ADM  and  blockpivot  in  both  speed  and  solution  quality. 

We  also  compare  the  algorithm  APG-MC  and  the  ADM-based  algorithm  (ADM-MC)  in  [9]  on  the 
hyperspectral  data  used  in  last  test.  It  is  demonstrated  in  [9]  that  ADM-MC  outperforms  matrix  completion 
solvers  APGL  and  LMaFit  for  nonnegative  matrix  completion  problems,  because  ADM-MC  utilizes  the 
nonnegativity  property  of  the  data  while  the  latter  two  do  not.  We  fix  the  dimension  q  =  40  in  (13)  and 
choose  sample  ratio:  SR  =  ^  fr0m  {0.20,0.30,0.40}.  The  parameter  5^  for  APG-MC  is  set  to  1,  and  all 
the  parameters  for  ADM-MC  are  set  to  their  default  values.  To  make  the  comparison  consistent  and  fair, 
we  let  both  of  the  two  algorithms  run  to  a  maximum  time  (sec)  T  =  50, 100,  and  we  employ  peak-signal- 
to-noise-ratio  (PSNR)  and  mean  squared  error  (MSE)  to  measure  the  performance  of  the  two  algorithms. 
Table  4  lists  the  average  results  of  10  independent  trials,  and  Figure  26  plots  the  4  corresponding  slices  of 
each  algorithm  for  SR  =  0.20  and  T  =  50  in  the  second  and  third  row.  From  the  table,  we  can  see  that 
APG-MC  is  significantly  better  than  ADM-MC  in  all  cases. 
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7  Multispectral  Oil  Spill  Detection 

The  paper  [22]  presents  a  novel  application  of  sparse  reconstruction  for  offshore  oil  spill  sensing  based  on 
multispectral  measurements.  Early  detection  of  the  onset  of  oil  spills  provides  rapid  first  response  and 
more  time  to  develop  a  post-accident  strategy.  It  is  the  key  to  prevent  the  snowball  effect  where  multiple 
catastrophic  events  lead  to  a  major  accident  such  as  the  recent  disastrous  accident  of  British  Petroleum. 
The  current  solution  employs  routine  helicopter  fly-bys  that  rely  on  assisted  human  vision,  which  apparently 
leaves  out  much  useful  information  imperceptible  to  human  eyes.  Also,  these  fly-by  surveys  can  only  be 
performed  during  daytime  and  when  weather  permits.  In  [22],  we  propose  a  novel  video-rate  infrared  (IR) 
multispectral  imaging  system,  which  aims  to  replace  the  traditional  fly-by  survey  and  achieve  higher  accuracy 
at  lower  cost. 

The  proposed  imaging  system  is  shown  in  Figure  27,  which  is  based  on  a  similar  system  operating  in  the 
visible  wavelengths  that  was  developed  in  [23]  with  a  fixed  random  coded  mask.  First,  the  thermal  emission 
from  a  region  of  interest,  represented  by  fs(x,  y ,  A),  where  (x,  y )  is  the  spatial  coordinate  and  A  is  spectrum. 
It  is  first  demagnified  and  imaged  to  the  object  plane  of  a  telescopic  4f  system  /o(x,t/,  A),  and  then  passed 
through  a  reflective  pixelated  spatial  light  modulator  with  a  binary  or  greyscale  reflection  function  T(x,y). 
The  resulting  field,  T/o,  is  then  imaged  by  the  dispersive  4f  system  with  a  grating  placed  at  its  Fourier 
plane.  The  field  at  the  detector  plane  can  be  written  as  the  convolution  of  the  point  spread  function  of  the 
spectrograph  and  T/o  as 

/(x,  y,  A)  =  T(x,  y)f0{x ,  y,  A).  (23) 

Then  /  is  convolved  with  the  dispersive  4f  system.  Since  the  detector  array  sensitivity  covers  the  wavelength 
range  of  7-14/im,  the  measured  quantity  received  is  integrated  over  the  wavelengths: 

g(x,  y)  =  J  f(x  +  a(X  -  Ac),  y,  X)dX,  (24) 

where  a  is  the  linear  dispersion  of  the  grating  and  Ac  is  the  center  wavelength.  The  above  equation  serves 
as  the  general  imaging  model  in  the  continuous  spatial  domain.  By  recognizing  that  both  the  mask  and 
the  detector  array  are  in  fact  pixelated  and  measurement  noise  is  always  present,  (24)  can  be  rewritten  in  a 
matrix  form  as 

g  =  Hf  +  w.  (25) 

Our  goal  is  to  recover  /o  from  g  and  H  with  unknown  noise  w.  The  recovery  process  has  two  steps:  first, 
recover  /  from  g ;  then  recover  /o  by  inpainting  /. 

The  recovery  of  /  takes  advantages  of  the  fact  that  oil  spills  are  approximately  sparsely  composed  of  a 
small  number  of  spectrum  signatures  (shown  in  Fig.  28),  which  form  a  dictionary  <£>  =  (0 1, 02,  •  •  • ,  0j).  If  we 
define  a  new  matrix  :=  i7<£>  and  let  u  be  the  coefficients  of  the  combination,  then  we  have  H^u  =  H  f  =  g 
and  u  is  sparse.  To  find  a  sparse  u  such  that  H^u  =  g  we  use  the  following  l\  minimization  model: 

min  \u\i,  s.t.  H2U  =  g  and  0  <  u  <  1.  (26) 

The  next  task  is  essentially  an  image  inpainting  problem,  i.e.,  to  fill  in  the  unknown  region  in  the  image 
according  to  the  known  region.  We  skip  the  details  of  this  part. 

The  simulated  data  in  Fig.  29  is  a  lab  picture.  The  left  figure  shows  a  2D  representative  of  the  3D 
image  (/o(x,  y.  A)),  where  every  pixel  is  classified  as  one  of  three  different  oil  film  thickness  values  (10/im, 
50/im,  100/im)  and  surrounding  water.  Then,  the  mask  T(x,  y)  is  chosen  with  30%  random  nonzero  binary 
elements.  The  middle  sub  figure  shows  the  resulting  measurement  g ,  which  contains  Gaussian  random  noise 
(SNR:30).  The  right  subfigure  displays  the  recovered  result,  demonstrating  the  efficacy  of  the  proposed 
method.  Fig.  30  shows  the  performance  with  different  noise  levels  in  the  measurements,  and  with  randomly 
selected  25%  nonzeros  in  T.  Overall,  90%  pixels  are  correctly  classified  with  noise  less  than  5%. 


Learning  Circulant  Sensing  Kernels 

In  the  paper  [24],  we  are  motivated  by  [25]  and  propose  numerical  methods  to  minimize  ||(4>T)*(4>T)  —  I\\p 
to  improve  the  recoverability  of  the  basis  pursuit  problem 

min  1 1 0 1 1 1 ,  s.t.  <£>4/0  =  5,  (27) 
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or  the  denoising  model 


mjn||0||i  +  -\\^6  -b-  ar/\\l,  (28) 

0  p 

where  <F  is  either  a  full  or  partial  circulant  matrix,  T  is  either  an  analytic  or  learned  dictionary,  b  is  the 
measurement  vector  and  77  is  noise  with  noise  level  a.  Like  [25],  we  also  learn  <F  and  T  together  (namely, 
coupled  learning  of  <F  and  T),  rather  than  just  learning  <f>  for  a  given  T. 

Let  <f>  =  PG,  where  P  is  a  downsampling  matrix  selecting  m  out  of  n  rows  from  C  and  C  is  a  square 
circulant  matrix.  We  first  optimize  C  by  minimizing  ||  (GT)*(GT)  —  I\\f  for  a  given  T  and  then  P  by  mini¬ 
mizing  ||  (PGT)*(PGT)  —  I\\p  for  given  C  and  T.  Let  P  be  the  normalized  discrete  Fourier  transformation 
matrix.  Then  C  can  be  written  as  C  =  Pdiag(d)P*  for  some  d.  Using  this  fact,  the  problem  of  minimizing 
||(CT)*(CT)  —  I\\f  can  be  equivalently  transformed  to 

mm  ^xT Bx  —  xTdiag(P),  (29) 

where  x  =  [\di  |2, . . . ,  |dn|2]T,  B  =  P*TT*P  and  B  =  [|P^  |2].  In  addition,  the  problem  of  minimizing 
|| (PCT)*(PCT)  —  I\\f  can  be  tranformed  to 

min pT B[p  —  2fTp ,  s.t.  pi  =  m  and  pi  E  {0, 1},  Vi,  (30) 

i 

where  pi  =  1  if  P  selects  row  i  and  zero  otherwise,  H  =  [|Gij|2],  and  /  =  diag(G)  with  G  =  GTT*G*. 
To  get  a  real  valued  C,  we  can  make  d  conjugate  symmetric  by  confining  di  =  conj(cG)  for  every  i  and 
i'  =  mod(n  —  i  +  1,  n)  +  1  in  the  problem  (29). 

In  synthetic  tests,  we  use  YALL1  [26]  to  solve  (27)  with  Gaussian  random  basis  and  Fourier  basis  to 
compare  four  different  sensing  matrices:  rand-circ  (randomly  generated  complex  partial  circulant  matrix 
<f>),  Gaussian  (complex  Gaussian  random  matrix  <F),  opt-circ  (<F  =  PC  with  optimized  G  and  uniformly 
random  P)  and  opt-circ- and- P  (<F  =  PC  with  optimized  G  and  P).  Figure  31  depicts  the  success  rate  out 
of  50  independent  trials  (a  recovery  is  called  successful  if  the  relative  error  is  less  than  10-4).  Both  the  two 
subfigures  of  Figure  31  reveal  that  optimized  circulant  sensing  matrices  lead  to  equally  good  performance  as 
Gaussian  random  matrices.  For  the  random  basis,  random  circulant  matrices  achieve  similar  recovery  success 
rate,  while  they  perform  extremly  bad  for  the  Fourier  basis.  On  the  other  hand,  optimizing  the  selection 
matrix  P  hardly  makes  further  improvement.  We  believe  that  unless  the  underlying  signal  has  dominant 
frequencies,  optimizing  the  selection  matrix  P  will  not  lead  to  consistent  improvement. 
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(a)  First  image 


(b)  Second  image 


Figure  14:  Plots  of  the  first  band  for  both  images 
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(a)  3  endmembers  from  first  scene. 


(b)  4  endmembers  from  second  scene. 


Figure  15:  Locations  of  the  computed  endmembers. 
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Figure  16:  Spectral  signatures  of  the  7  endmember s 


Figure  17:  Angle  between  abundance  vectors  computed  using  7  endmembers. 
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Figure  18:  Averaged  spectral  signatures  of  the  7  endmember s 


Figure  19:  Angle  between  reduced  abundance  vectors  combining  the  5th  and  6th  components. 
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Figure  20:  Change  detection  result  after  applying  TV  regularization. 


Figure  21:  Target  image  “Color  wheel”  (left)  and  estimated  abundance  from  10%  measurements. 
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Figure  22:  Four  slices  computed  by  the  proposed  approach  (left)  and  by  TwIST  (right) 


Figure  23:  Matrix  completion  with  different  sample  rates  (SRs).  Left:  relative  error  in  Frobenious  norm; 
Right:  cpu  time  in  seconds.  The  algorithm  in  [10]  was  used  for  SR=100%.  Algorithm  [9]  was  used  for 
SR=70%,  50%,  25%.  All  tests  used  the  same  parameters  and  stopping  tolerances,  and  results  are  the 
averages  over  50  independent  trials 
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Figure  24:  Three  selected  slices  for  hyperspectral  data  cube 
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Figure  25:  Real  data:  the  first,  second  and  third  row  of  each  subfigure  are  recovered  images  by  Algorithm 
[9],  LMaFit,  and  FPCA,  respectively 


Figure  26:  Hyperspectral  data:  four  selected  slices  (first  row)  and  the  corresponding  recovered  slices  by 
APG-MC  (second  row)  and  ADM-MC  (third  row)  for  sample  ratio  SR  =  0.20  and  maximum  running  time 


T  =  50. 


Telescopic  4f 


Dispersive  4f 


Figure  27:  Proposed  imaging  system. 
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Figure  28:  Spectral  response  of  different  oil  film  thickness  and  water. 


Figure  29:  Simulated  results  (92%  recovered  correctly)  on  a  67  x  92  image  with  30%  nonzeros  in  T  and  noise 
(SNR=30)  added,  and  the  running  time  is  5  seconds. 


Figure  30:  The  percentage  of  correctly  recovered  pixels  for  different  SNR  values  with  25%  nonzeros  in  T. 
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Figure  31:  Comparison  results  of  four  different  sensing  matrices:  rand-circ ,  Gaussian ,  opt-circ  and  opt-circ- 
and-P  with  Gaussian  random  basis  (left)  and  Fourier  basis  (right). 


Figure  32:  Comparison  results  of  four  different  sensing  matrices:  rand-circ ,  Gaussian ,  opt-plus-rand-circ  and 
coupled- plus- rand- circ  on  Berkeley  segmentation  dataset  for  sampled  row  numbers  m  —  16  (left)  and  m  —  24 
(right). 
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For  real  image,  we  did  two  sets  of  tests.  In  the  first  set,  we  use  YALL1  to  solve  (28)  with  learned 
dictionaries  T  by  KSVD  method  [27]  using  20,000  randomly  extracted  8x8  patches  from  the  200  training 
images  in  the  Berkeley  segmentation  dataset  [28].  We  test  rand-circ ,  Gaussian  and  opt-plus-rand-circ  ( opt- 
circ  with  C  added  by  a  real  random  circulant),  as  well  as  coupled-plus-rand- circ  (<F  =  PC  with  uniformly 
random  P  and  C  the  sum  of  a  real  random  circulant  and  that  learned  simultaniously  with  its  corresponding 
dictionary  T)  on  600  uniformly  randomly  chosen  8x8  patches  from  the  100  testing  images  in  the  Berkeley 
segmentation  dataset.  Figure  32  depicts  the  mean  squared  error:  MSE  =  JT=1  \\xl  ~  II i/ (G4£) ,  where  xl 
is  the  ith  selected  patch,  £  is  the  number  of  tested  patches,  and  0l  was  the  solution  of  (28)  output  by  YALL1. 
All  results  are  averages  of  20  independent  trials.  Both  the  two  pictures  in  Figure  32  reveal  that  uncoupled 
and  coupled  learning  approaches  achieve  significantly  better  recovery  performance  over  random  circulant 
matrices,  and  they  are  even  better  than  Gaussian  random  matrices  when  m  —  16.  The  coupled  learning 
approach  makes  a  slightly  better  performance  over  the  uncoupled  one.  We  believe  that  coupled  learned 
circulant  matrices  will  not  make  a  large  improvement  over  uncoupled  learned  ones  unless  the  underlying 
signal  is  sparser  under  the  learned  dictionary. 

In  the  second  set  of  tests,  we  use  a  150  x  150  x  163  hyper  spectral  cube  and  learn  a  dictionary  from  all 
the  21025  overlapping  6x6  patches  extracted  from  the  first  slice  of  the  hyperspectral  cube,  which  is  shown 
in  the  left  side  of  Figure  33.  The  four  different  sensing  matrices  rand-circ ,  Gaussian ,  opt-plus-rand-circ  and 
coupled-plus-rand- circ  are  compared  on  500  uniformly  randomly  selected  patches  from  the  remaining  162 
slices.  Figure  34  depicts  the  average  MSEs  of  20  independent  trials,  and  Figure  35  plots  one  set  of  recovered 
slices  corresponding  to  the  150th  slice  shown  in  the  right  side  of  Figure  33.  Again,  the  learned  circulant 
sensing  matrices  make  greater  improvement  over  the  random  circulant  one  and  even  better  than  the  Gaussian 
random  one. 
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Figure  33:  Two  selected  slices  of  the  hyperspectral  data 


Figure  34:  Comparison  results  of  four  different  sensing  matrices:  rand-circ ,  Gaussian ,  opt-plus-rand-circ  and 
coupled-plus-rand- circ  on  hyperspectral  data  for  sampled  row  numbers  m  =  8  (left)  and  m  =  12  (right). 
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Figure  35:  One  set  of  recovered  slices  by  YALL1  with  rand-circ ,  Gaussian ,  opt-plus-rand-circ  and  coupled- 
plus-rand- circ  for  m  =  12 
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